Identifying Person Duplicates of Short Geographic Distance by Computer Matching
Abstract
The Census Bureau conducted evaluations of person duplication in Census 2000. Duplicates of short geographic distance were identified by both clerical and computer matching. The evaluations showed that, for these short-distance duplicates, the computer matching algorithms were not able to find all of the duplicates identified by the clerks. However, the computer matching algorithms in the previous evaluations were developed primarily to identify duplicates of longer distances. This report analyzes the potential of computer matching when the focus is on short-distance duplicates. I used the Bureau's record linkage software to do the computer matching and used SAS to compare the computer matching results with the clerical results. First, I attempted to identify groups of links with high concentrations of true duplicates; I used Enterprise Miner to generate decision trees for several approaches and compared their results. Second, I analyzed clerical duplicates that the computer matching did not identify, looking for patterns in these cases.
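As a rough illustration of the comparison step only, the sketch below computes, for each group of computer links, the share that the clerks also identified as duplicates, and lists the clerical duplicates that no computer link reproduced. It is not the Bureau's record linkage software or the SAS and Enterprise Miner workflow used in the report. The function names, the integer person IDs, and the "high"/"low" group labels are all assumptions made for the example, and the data are made up.

```python
from collections import defaultdict


def normalize(pair):
    """Order a pair of person IDs so (a, b) and (b, a) compare equal."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def concentration_by_group(computer_links, clerical_pairs):
    """Return, per group, the fraction of links the clerks also called duplicates.

    computer_links: iterable of ((id_a, id_b), group_label)
    clerical_pairs: iterable of (id_a, id_b) treated as true duplicates
    """
    truth = {normalize(p) for p in clerical_pairs}
    totals = defaultdict(int)
    hits = defaultdict(int)
    for pair, group in computer_links:
        totals[group] += 1
        if normalize(pair) in truth:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def missed_by_computer(computer_links, clerical_pairs):
    """Return clerical duplicates that do not appear among the computer links."""
    found = {normalize(pair) for pair, _ in computer_links}
    truth = {normalize(p) for p in clerical_pairs}
    return sorted(truth - found)


# Made-up illustration data (hypothetical IDs and group labels).
computer_links = [
    ((1, 2), "high"),
    ((3, 4), "high"),
    ((5, 6), "low"),
    ((8, 7), "low"),
]
clerical_pairs = [(2, 1), (3, 4), (7, 8), (9, 10)]

rates = concentration_by_group(computer_links, clerical_pairs)
missed = missed_by_computer(computer_links, clerical_pairs)
print(rates)
print(missed)
```

In a real analysis the group label would come from whatever attributes split the links, for example the leaves of a decision tree; here it is simply supplied by hand.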
Similar resources
Effective and Efficient XML Duplicate Detection Using Levenshtein Distance Algorithm
There is a large amount of work on discovering duplicates in relational data; only a few findings concentrate on duplication in more complex hierarchical structures. Electronic information is one of the key factors in several business operations, applications, and decisions; as a result, guaranteeing its quality is necessary. Duplicates are several delegacy of ...
Adaptive Approximate Record Matching
Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...
A procedure for Web Service Selection Using WS-Policy Semantic Matching
In general, policy-based approaches play an important role in the management of web services, for instance in the choice of semantic web services and, in particular, quality of service (QoS). The present research work illustrates a procedure for web service selection among functionally similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...
Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases
The problem of identifying approximately duplicate records in databases has previously been studied as record linkage, the merge/purge problem, hardening soft databases, and field matching. Most existing approaches have focused on efficient algorithms for locating potential duplicates rather than precise similarity metrics for comparing records. In this paper, we present a domain-independent me...
Matching of Polygon Objects by Optimizing Geometric Criteria
Despite the semantic criteria, geometric criteria have different performances on polygon feature matching in different vector datasets. By using these criteria for measuring the similarity of two polygons in all matchings, the same results would not have been obtained. To achieve the best matching results, the determination of optimal geometric criteria for each dataset is considered necessary....